Fix #691: Add file locking to local results file saves to prevent race conditions #713
base: master
…nt race conditions

- Wrap `board.save(append=True)` with `file_lock()` in the `_save()` method
- Use the same timeout as global saves (`global_lock_timeout` config)
- Add error handling for lock timeout scenarios
- Add a test to verify that parallel saves work correctly without data loss

This prevents data loss when multiple processes write to the same local results.csv file simultaneously during parallel execution.
Walkthrough
Adds a file lock around the local results CSV write to prevent append races between concurrent processes. Lock acquisition uses a timeout; on timeout, the exception is logged and the process continues with saving the global results.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Proc as Process
    participant FS as Local FS (results.csv)
    participant Lock as FileLock
    participant Global as Global Scoreboard
    Proc->>Lock: acquire(results.csv) with timeout
    alt lock acquired
        Lock->>FS: append row
        FS-->>Lock: ack
        Lock->>Proc: release
        Proc->>Global: save global board
        Global-->>Proc: ack
    else timeout
        Lock-->>Proc: TimeoutError
        Proc->>Proc: log exception (timeout)
        Proc->>Global: save global board (continue)
        Global-->>Proc: ack
    end
```
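The control flow in the diagram can be sketched in plain Python. This is a minimal, hypothetical sketch, not the amlb `_save` method: the lock is injected as a factory (in amlb that role is played by `file_lock` from amlb/utils/process.py), and `save_local` / `save_global` are illustrative callables standing in for `board.save(append=True)` and `_save_global(board)`.

```python
import logging
from contextlib import AbstractContextManager
from typing import Callable

log = logging.getLogger(__name__)

# Hypothetical type: anything called as lock(path, timeout=...) that returns a context manager.
LockFactory = Callable[..., AbstractContextManager]


def save_results(local_path: str, lock: LockFactory, timeout: float,
                 save_local: Callable[[], None], save_global: Callable[[], None]) -> bool:
    """Append to the local file under `lock`, then always save globally.

    Returns True if the local append ran, False if a TimeoutError occurred.
    """
    try:
        with lock(local_path, timeout=timeout):
            save_local()
    except TimeoutError:
        # Timeout branch of the diagram: log, then fall through to the global save.
        log.exception("Timed out locking %s; local results may be incomplete", local_path)
        saved_locally = False
    else:
        saved_locally = True
    # Both branches end with the global save.
    save_global()
    return saved_locally
```

Injecting the lock keeps the timeout path testable without real contention: a factory that raises `TimeoutError` exercises the `else timeout` branch directly.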
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
📜 Recent review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
🧰 Additional context used
🧬 Code graph analysis (1): tests/unit/amlb/test_results_race_condition.py (1)
🪛 Ruff (0.14.5), tests/unit/amlb/test_results_race_condition.py:
- 113-113: Unused function argument (ARG001)
- 113-113: Unused function argument (ARG001)
- 114-114: Avoid specifying long messages outside the exception class (TRY003)
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- amlb/benchmark.py (1 hunk)
- tests/unit/amlb/test_results_race_condition.py (1 hunk)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/unit/amlb/test_results_race_condition.py (2)
- amlb/results.py (5): Scoreboard (82-349), append (312-327), save (308-310), all (86-87), as_data_frame (189-237)
- amlb/utils/process.py (1): file_lock (35-49)
amlb/benchmark.py (3)
- amlb/data.py (1): path (120-121)
- amlb/results.py (3): path (330-349), save (308-310), append (312-327)
- amlb/utils/process.py (1): file_lock (35-49)
🪛 Ruff (0.14.5)
tests/unit/amlb/test_results_race_condition.py
- 115-115: Local variable `original_file_lock` is assigned to but never used (F841). Remove assignment to unused variable `original_file_lock`.
- 117-117: Unused function argument: `args` (ARG001)
- 117-117: Unused function argument: `kwargs` (ARG001)
- 118-118: Avoid specifying long messages outside the exception class (TRY003)
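One possible way to address these findings is sketched below. The helper and exception names are hypothetical, not taken from the test file. The unused `original_file_lock` assignment can simply be dropped if the test patches `file_lock` with `unittest.mock.patch` or pytest's `monkeypatch`, since both restore the original automatically. Underscore-prefixed variadic parameters mark the arguments as intentionally unused (assuming Ruff's default `dummy-variable-rgx`), and moving the message into a dedicated exception class addresses TRY003.

```python
class SimulatedLockTimeout(TimeoutError):
    """Timeout raised by the fake lock; the message lives in the class (TRY003)."""

    def __init__(self) -> None:
        super().__init__("Simulated timeout while acquiring the results file lock")


def failing_file_lock(*_args, **_kwargs):
    """Hypothetical stand-in for file_lock that always fails to acquire the lock.

    Underscore-prefixed names mark the arguments as intentionally unused (ARG001).
    """
    raise SimulatedLockTimeout


def save_with_lock(lock_factory, path):
    """Hypothetical code under test: returns False when locking times out."""
    try:
        with lock_factory(path, timeout=1):
            return True
    except TimeoutError:
        return False
```

Subclassing `TimeoutError` keeps the fake compatible with an `except TimeoutError:` handler in the code under test.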
🔇 Additional comments (1)
amlb/benchmark.py (1)
474-489: Local results locking matches the intended fix; just confirm the timeout exception mapping.
The new `_save` implementation correctly wraps the local `board.save(append=True)` in `file_lock(local_path, timeout=rconfig().results.global_lock_timeout)`, mirroring the existing global lock behavior and addressing the race condition described in #691. Swallowing only `TimeoutError` and proceeding to `_save_global(board)` is a reasonable trade-off: the run is still reflected in the global board, and the log message clearly states that local results may be incomplete.
One thing to double-check: make sure the `filelock` version you depend on still raises an exception type that subclasses `TimeoutError` on lock-acquisition failures, so that the `except TimeoutError:` block is reliably triggered. If that ever changes, you may want to catch the library's specific timeout class instead.
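A defensive variant of the handler is sketched below. It assumes, as the review comment implies but does not confirm, that amlb's `file_lock` is built on the `filelock` package, whose documented timeout exception is `filelock.Timeout`. `LOCK_TIMEOUT_ERRORS` and `lock_timed_out` are hypothetical names for illustration.

```python
try:
    # filelock's exception for lock-acquisition timeouts.
    from filelock import Timeout as LockTimeout
except ImportError:  # filelock not installed: fall back to the builtin.
    LockTimeout = TimeoutError

# Catch both, so the handler keeps working even if LockTimeout
# ever stops subclassing TimeoutError.
LOCK_TIMEOUT_ERRORS = (TimeoutError, LockTimeout)


def lock_timed_out(acquire) -> bool:
    """Run `acquire()` and report whether it failed with a lock timeout."""
    try:
        acquire()
    except LOCK_TIMEOUT_ERRORS:
        return True
    return False
```

In `_save`, the same tuple would replace the bare `except TimeoutError:`, at the cost of one extra import.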
Description
This PR fixes issue #691 by adding file locking to writes of the local results file, preventing race conditions when multiple processes write to the same results.csv file simultaneously.
Problem
When running AMLB in local mode in parallel (e.g., on a cluster or a large machine with NFS), multiple processes could try to append to the same local file (`session_dir/scores/results.csv`) at the same time. This caused one of the append operations to be dropped, resulting in data loss.
The global results file was already protected by file locking, but the local results file was not.
Solution
- Wrap `board.save(append=True)` in the `_save()` method with `file_lock()` protection
- Reuse the `global_lock_timeout` configuration (default 5 seconds)

Changes
- `_save()` method updated to wrap the local file save with file locking

Testing
The test simulates 10 parallel saves to the same results file and verifies that they complete without data loss.
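A test of this shape can be sketched with the standard library alone. This is not the PR's test: threads stand in for separate processes, `file_lock` is a hypothetical lock based on an atomically created `<path>.lock` file (`os.O_CREAT | os.O_EXCL`) rather than amlb's `file_lock`, and `append_row` / `run_parallel_saves` are illustrative names.

```python
import csv
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager


@contextmanager
def file_lock(path, timeout=5.0, poll=0.01):
    """Hypothetical lock: atomically create `<path>.lock` or raise TimeoutError."""
    lock_path = f"{path}.lock"
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if another writer holds the lock file.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not lock {path} within {timeout}s") from None
            time.sleep(poll)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


def append_row(path, row):
    """Append one CSV row under the lock, writing the header only for a new file."""
    with file_lock(path):
        new_file = not os.path.exists(path) or os.path.getsize(path) == 0
        with open(path, "a", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(row))
            if new_file:
                writer.writeheader()
            writer.writerow(row)


def run_parallel_saves(n=10):
    """Run n concurrent appends and return the rows read back from the file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "results.csv")
        with ThreadPoolExecutor(max_workers=n) as pool:
            # list() forces evaluation so any worker exception is re-raised here.
            list(pool.map(lambda i: append_row(path, {"task": f"task_{i}", "fold": 0}), range(n)))
        with open(path, newline="") as f:
            return list(csv.DictReader(f))
```

The header check runs inside the lock, so exactly one writer emits the header and every row appears once. A lock file left behind by a crashed process would block later writers until the timeout, which is one reason a timeout is worth having.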
Fixes #691